Abstract: This paper presents an implementation of CBPGA (Cluster-Based Parallel Genetic Algorithm) [1] for simplified processing of large data sets on Hadoop MapReduce. Hadoop is a framework for processing large amounts of data in a parallel and distributed manner. It provides reliable data storage and an efficient processing system. The two main components of Hadoop are HDFS (Hadoop Distributed File System), for storage, and MapReduce, for processing. MapReduce is a programming model that enables parallel processing in a distributed environment. Clustering, one of the important data mining methods, classifies similar objects under the same group, called a cluster. Metaheuristic techniques such as Genetic Algorithms (GAs) [2] constitute the best alternative for finding near-optimal solutions to such problems within a reasonable execution time and with limited resources. To improve efficiency, a better approach, MapReduce for Parallelizing Genetic Algorithms (MRPGA) [3][4], exploits the features of Hadoop. We analyze the proposed CBPGA algorithm to evaluate its performance gains with respect to the existing MapReduce Word Count algorithm [5]. Our aim is to compare the two parallel algorithms in terms of speedup as the number of processing nodes varies, on text files of different sizes, and to find a solution within a reasonable time. The parallel implementation makes the CBPGA algorithm faster and more scalable in finding optimal solutions when working with large data clusters in parallel.
Keywords: Big Data, Word Count, Hadoop, MapReduce, Cluster, Parallel Genetic Algorithm.
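
The MapReduce programming model used by the Word Count baseline can be illustrated with a minimal single-process sketch. It is not the Hadoop implementation and not the CBPGA algorithm. It only mimics the map, shuffle, and reduce steps in plain Python, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key (the word).

    On a real cluster the framework performs this grouping across nodes;
    here it is simulated with an in-memory dictionary.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["big data", "Big cluster data"]
    print(word_count(sample))
```

In a distributed setting each map call would run on the node holding its input split, and each reduce call would run on the node assigned its key. The sketch keeps the same data flow so the three phases can be read in isolation.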